interval that Chebyshev's inequality induces for small $\alpha$ and
which avoids the error of approximation that assuming normality
induces. The paper also presents an analogous development for
deriving a $100\times(1-\alpha)$ percent confidence interval for a proportion.
Introduction


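Let $X_1,\ldots,X_n$ be independent identically distributed random
variables with $\mu = EX_i$, $\mathrm{pr}(0 \le X_i \le 1) = 1$ and
$\bar X_n = (X_1+\cdots+X_n)/n$. This paper describes a method for
deriving a $100\times(1-\alpha)$ percent interval estimate of $\mu$ for
finite $n$ based on the probability inequality
(Hoeffding 1963, Thm. 1, (2.1))

$$\mathrm{pr}(\bar X_n - \mu \ge \varepsilon) \le e^{n f(\varepsilon,\mu)} \tag{1}$$

where

$$f(\varepsilon,\mu) = (\varepsilon+\mu)[\ln\mu - \ln(\mu+\varepsilon)] + (1-\varepsilon-\mu)[\ln(1-\mu) - \ln(1-\mu-\varepsilon)], \qquad \varepsilon < 1-\mu, \tag{2}$$

and

$$\lim_{\varepsilon\to 1-\mu} f(\varepsilon,\mu) = \ln\mu.$$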
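As a minimal sketch, the exponent (2) and the bound (1) can be evaluated numerically as follows. The function names `f` and `hoeffding_bound` are illustrative choices made here, and the convention $0\cdot\ln 0 = 0$ is assumed so that the limit $\varepsilon \to 1-\mu$ is handled.

```python
import math

def f(eps, mu):
    """Exponent f(eps, mu) of equation (2), for 0 < mu < 1 and 0 <= eps <= 1 - mu."""
    upper = mu + eps           # mu + eps
    rest = 1.0 - mu - eps      # 1 - mu - eps
    value = upper * (math.log(mu) - math.log(upper))
    if rest > 0.0:             # the second term vanishes as eps -> 1 - mu
        value += rest * (math.log(1.0 - mu) - math.log(rest))
    return value

def hoeffding_bound(eps, mu, n):
    """Right-hand side of (1): an upper bound on pr(sample mean - mu >= eps)."""
    return math.exp(n * f(eps, mu))
```

For example, `hoeffding_bound(0.1, 0.3, 50)` bounds the probability that the mean of 50 observations with $\mu = 0.3$ reaches 0.4 or more.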

To put our results in perspective, we first review the
derivation of two commonly encountered confidence intervals. Let

$$k(x,\beta,m) = \{x + \beta^2/2m + \beta[\beta^2/4m^2 + x(1-x)/m]^{1/2}\}/(1+\beta^2/m), \qquad 0 \le x \le 1,\ m = 1,2,\ldots. \tag{3}$$

Then the interval $(k(\bar X_n,-\alpha^{-1/2},n),\ k(\bar X_n,\alpha^{-1/2},n))$ covers $\mu$ with
confidence coefficient $> 1-\alpha$, as a consequence of Chebyshev's
inequality and the observation that
$\mathrm{var}\,X_i = E(X_i-\mu)^2 = E[X_i(X_i-\mu)] \le \mu(1-\mu)$.
Moreover, as a result of the central limit theorem, the interval
$(k(\bar X_n,-\Phi^{-1}(1-\alpha/2),n),\ k(\bar X_n,\Phi^{-1}(1-\alpha/2),n))$ asymptotically
$(n\to\infty)$ covers $\mu$ with confidence coefficient $\ge 1-\alpha$, where

$$\Phi^{-1}(\theta) = \{y : (2\pi)^{-1/2}\int_{-\infty}^{y} e^{-z^2/2}\,dz = \theta\}.$$
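The two intervals built from (3) can be sketched in a few lines. The helper names below are hypothetical, and the standard library's `statistics.NormalDist` is assumed as the source of $\Phi^{-1}$.

```python
import math
from statistics import NormalDist

def k(x, beta, m):
    """Endpoint function of equation (3); a negative beta gives the lower endpoint."""
    root = math.sqrt(beta**2 / (4 * m**2) + x * (1 - x) / m)
    return (x + beta**2 / (2 * m) + beta * root) / (1 + beta**2 / m)

def chebyshev_interval(xbar, n, alpha):
    """Interval with beta = alpha**(-1/2), valid for every n."""
    beta = alpha ** -0.5
    return k(xbar, -beta, n), k(xbar, beta, n)

def normal_interval(xbar, n, alpha):
    """Interval with beta = Phi^{-1}(1 - alpha/2), valid only asymptotically."""
    beta = NormalDist().inv_cdf(1 - alpha / 2)
    return k(xbar, -beta, n), k(xbar, beta, n)
```

Each endpoint $p$ returned by `k` is a root of $(x-p)^2 = \beta^2 p(1-p)/m$, which is the quadratic that inverting the Chebyshev or normal bound produces.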


Although the Chebyshev interval holds for all $n$, the width
of the resulting interval is considerably larger than that for
the asymptotic normal one. For example,
$\alpha^{-1/2}/\Phi^{-1}(1-\alpha/2) = 2.28$ for $\alpha = .05$.
However, because of the nonuniform convergence of
$(\bar X_n-\mu)/[\mu(1-\mu)/n]^{1/2}$, using the normal confidence interval obliges
one to account for the inevitable error of approximation for
finite $n$. This error makes it difficult to assess whether or
not the associated confidence coefficient truly exceeds $1-\alpha$, and
can be especially bothersome in a Monte Carlo sampling experiment
where the problem dictates the maximal interval width and the
minimal acceptable confidence level. Even less appealing are
interval estimates of the form
$(\bar X_n - \beta(S_n^2/n)^{1/2},\ \bar X_n + \beta(S_n^2/n)^{1/2})$ where
$S_n^2$ is the sample variance computed from $\sum_{i=1}^{n}(X_i-\bar X_n)^2$,
with $\beta = \alpha^{-1/2}$ and $\beta = \Phi^{-1}(1-\alpha/2)$ for the Chebyshev and normal cases
respectively. Although intended to shorten the intervals by
using the additional information in $S_n^2 \le \bar X_n(1-\bar X_n)$, the
substitution of $S_n^2$ for $\mathrm{var}\,X_i$ induces an additional error of
approximation in assessing whether or not the resulting confidence
coefficient exceeds $1-\alpha$.

Hoeffding (1963) derived the probability inequality (1) for
all bounded $X_i$. Previously Okamoto had derived (2) for $X_i$ with
the Bernoulli distribution, $\mathrm{pr}(X_i=0) = 1-\mu$ and $\mathrm{pr}(X_i=1) = \mu$, a
result implicit in Chernoff (1952, Thm. 1 and Ex. 5). Theorem 1
provides the basis for constructing a confidence interval for $\mu$
based on Hoeffding's theorem.

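Theorem 1. Let $X_1,\ldots,X_n$ be i.i.d. random variables with
$\mu = EX_i$, $\mathrm{pr}(0 \le X_i \le 1) = 1$ and $\lambda = \max[\mathrm{pr}(X_i=0),\ \mathrm{pr}(X_i=1)]$. Then
for $n \ge \ln(\alpha/2)/\ln\lambda$, $(\psi_1(\bar X_n,\alpha/2),\ \psi_2(\bar X_n,\alpha/2))$ covers $\mu$ with
probability $> 1-\alpha$ where $\psi_1(\bar X_n,\alpha/2) \le \bar X_n \le \psi_2(\bar X_n,\alpha/2)$ are the solutions to

$$f(\bar X_n - \psi, \psi) = \frac{1}{n}\ln(\alpha/2). \tag{4}$$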
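One way to obtain the two solutions of (4) numerically is bisection on either side of $\bar X_n$, where $-f(\bar X_n-\psi,\psi)$ is monotone in $\psi$. The sketch below is an assumption-laden illustration rather than the paper's own procedure: the names `neg_f`, `_bisect` and `hoeffding_interval`, the fixed iteration count, and the convention $0\cdot\ln 0 = 0$ are all choices made here.

```python
import math

def neg_f(x, psi):
    """-f(x - psi, psi) from (2), with the convention 0 * ln 0 = 0."""
    if (x > 0.0 and psi <= 0.0) or (x < 1.0 and psi >= 1.0):
        return math.inf
    total = 0.0
    if x > 0.0:
        total += x * math.log(x / psi)
    if x < 1.0:
        total += (1.0 - x) * math.log((1.0 - x) / (1.0 - psi))
    return total

def _bisect(x, far, near, target, iterations=200):
    """Find psi between `far` and `near` with neg_f(x, psi) = target.

    neg_f exceeds target at `far` and is below it at `near` (near = x).
    """
    for _ in range(iterations):
        mid = 0.5 * (far + near)
        if neg_f(x, mid) > target:
            far = mid
        else:
            near = mid
    return 0.5 * (far + near)

def hoeffding_interval(xbar, n, alpha):
    """Return (psi_1, psi_2), the two roots of equation (4)."""
    target = math.log(2.0 / alpha) / n   # equals -(1/n) ln(alpha/2) > 0
    lower = _bisect(xbar, 0.0, xbar, target) if xbar > 0.0 else 0.0
    upper = _bisect(xbar, 1.0, xbar, target) if xbar < 1.0 else 1.0
    return lower, upper
```

For instance, `hoeffding_interval(0.3, 100, 0.05)` returns the pair of roots that bracket 0.3 for $n = 100$ and $\alpha = .05$.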

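Proof. Observe that

$$\partial f(\varepsilon,\mu)/\partial\varepsilon = \ln[\mu(1-\mu-\varepsilon)/(\mu+\varepsilon)(1-\mu)] < 0, \qquad 0<\varepsilon<1-\mu,$$

$$f(0,\mu) = 0$$

and recall that

$$\lim_{\varepsilon\to 1-\mu} f(\varepsilon,\mu) = \ln\mu < 0.$$

Therefore, $e^{nf(\varepsilon,\mu)}$ is monotone decreasing in $\varepsilon$ with maximum 1 and
minimum $\mu^n$.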

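Consider the equal tail probability case and let

$$\varepsilon(\mu,\alpha/2) = \begin{cases} \{\varepsilon : e^{nf(\varepsilon,\mu)} = \alpha/2\} & \text{if } \mu^n \le \alpha/2 \\ 1-\mu & \text{if } \mu^n > \alpha/2. \end{cases}$$

Then

$$\mathrm{pr}[\bar X_n \ge h(\mu,\alpha/2)] \le \alpha/2 \tag{5}$$

where

$$h(\mu,\alpha/2) = \mu + \varepsilon(\mu,\alpha/2).$$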

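For $\mu^n > \alpha/2$, $\mathrm{pr}(\bar X_n \ge 1) \le \lambda^n \le \alpha/2$. For $\mu^n \le \alpha/2$, we want to find
the set of all $\mu$'s satisfying (5). From $e^{nf(\varepsilon,\mu)} = \alpha/2$,

$$d\varepsilon(\mu,\alpha/2)/d\mu = -\left\{1 + \frac{\varepsilon}{\mu(1-\mu)\ln[\mu(1-\mu-\varepsilon)/(1-\mu)(\mu+\varepsilon)]}\right\} \tag{6}$$

so that

$$dh(\mu,\alpha/2)/d\mu > 0,$$

implying that $h$ is monotone increasing in $\mu$. Therefore, the set
of $\mu$'s of interest is $\{\mu : 0 < \mu \le \psi_1(\bar X_n,\alpha/2)\}$ where

$$\psi_1(x,\alpha/2) = \{\psi : \psi + \varepsilon(\psi,\alpha/2) = x\},$$

which is precisely the solution to (4) in the interval $[0,\bar X_n]$.
Consequently,

$$\mathrm{pr}[\psi_1(\bar X_n,\alpha/2) \ge \mu] \le \alpha/2$$

so that

$$\mathrm{pr}[\psi_1(\bar X_n,\alpha/2) < \mu] > 1-\alpha/2,$$

as required.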
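The step leading to (6) is an implicit differentiation of the constraint $n f(\varepsilon,\mu) = \ln(\alpha/2)$ with respect to $\mu$. The worked derivation below is a sketch reconstructed here from (2), not text from the original; the shorthand $L$ is introduced only for convenience.

```latex
\begin{align*}
L &:= \frac{\partial f}{\partial\varepsilon}
   = \ln\frac{\mu(1-\mu-\varepsilon)}{(\mu+\varepsilon)(1-\mu)} < 0, \\
\frac{\partial f}{\partial\mu}
  &= L + \frac{\varepsilon}{\mu} + \frac{\varepsilon}{1-\mu}
   = L + \frac{\varepsilon}{\mu(1-\mu)}, \\
\frac{d\varepsilon}{d\mu}
  &= -\frac{\partial f/\partial\mu}{\partial f/\partial\varepsilon}
   = -\Bigl\{1 + \frac{\varepsilon}{\mu(1-\mu)\,L}\Bigr\}, \\
\frac{dh}{d\mu}
  &= 1 + \frac{d\varepsilon}{d\mu}
   = -\frac{\varepsilon}{\mu(1-\mu)\,L} > 0 .
\end{align*}
```

The last line is positive because $\varepsilon > 0$ and $L < 0$ on $0 < \varepsilon < 1-\mu$.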

The upper bound $\psi_2(\bar X_n,\alpha/2)$ follows analogously, using

$$\mathrm{pr}(\mu - \bar X_n \ge \varepsilon) \le e^{nf(\varepsilon,1-\mu)}.$$
Observe that if $X_i$ has a continuous distribution, $\lambda = 0$ and
Theorem 1 holds for all sample sizes $n$. If $\mathrm{pr}(a \le X_i \le b) = 1$,
then Theorem 1 holds with $100\times(1-\alpha)$ percent confidence interval

$$((b-a)\,\psi_1((\bar X_n-a)/(b-a),\alpha/2)+a,\ (b-a)\,\psi_2((\bar X_n-a)/(b-a),\alpha/2)+a).$$

Although for Bernoulli data and small $n$, one can compute an exact
confidence interval for $\mu$, as in Blyth and Still (1983), this
option loses its appeal as $n$ increases and the potential for
numerical error grows. Table 1 shows the lower bounds on $n$ for
$\alpha = .01$ and $.05$.
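The lower bound on $n$ in Theorem 1 is the smallest $n$ with $\lambda^n \le \alpha/2$. A minimal sketch of that computation follows; the function name `min_sample_size` is a choice made here, and the final loop is only a guard against floating-point rounding in the logarithm ratio.

```python
import math

def min_sample_size(lam, alpha):
    """Smallest n with lam**n <= alpha/2, the sample-size condition of Theorem 1."""
    if not 0.0 <= lam < 1.0:
        raise ValueError("lam must lie in [0, 1)")
    if lam == 0.0:
        return 1                      # continuous case: every n >= 1 qualifies
    n = max(1, math.ceil(math.log(alpha / 2) / math.log(lam)))
    while lam ** n > alpha / 2:       # guard against rounding in the ratio above
        n += 1
    return n
```

For $\lambda = .90$ and $\alpha = .05$ this gives 36, and for $\lambda = .9999$ and $\alpha = .01$ it gives 52981.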


Insert  Table  1  about  here 


Using the dominant term (as $n\to\infty$) of the Taylor series of
$f(x-\psi,\psi)$, one can readily show that as $n$ increases

$$\psi_2(\bar X_n,\alpha/2) - \psi_1(\bar X_n,\alpha/2) \sim 2[2\ln(2/\alpha)\,\bar X_n(1-\bar X_n)/n]^{1/2}. \tag{7}$$

To order $n^{-1/2}$, the Chebyshev and normal intervals have widths
$2[\alpha^{-1}\bar X_n(1-\bar X_n)/n]^{1/2}$ and $2\Phi^{-1}(1-\alpha/2)[\bar X_n(1-\bar X_n)/n]^{1/2}$ respectively.
Table 2 compares these widths for $\alpha = .01$ and $.05$.
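All three leading-order widths share the factor $2[\bar X_n(1-\bar X_n)/n]^{1/2}$, so a comparison reduces to three multipliers that depend only on $\alpha$. The sketch below computes them; the function name and dictionary keys are illustrative, and `statistics.NormalDist` is assumed for $\Phi^{-1}$.

```python
import math
from statistics import NormalDist

def width_multipliers(alpha):
    """Multipliers of 2 * [xbar * (1 - xbar) / n] ** 0.5 in the three widths."""
    return {
        "hoeffding": math.sqrt(2.0 * math.log(2.0 / alpha)),   # from (7)
        "chebyshev": alpha ** -0.5,
        "normal": NormalDist().inv_cdf(1.0 - alpha / 2.0),
    }
```

Ratios of these multipliers, for example `m["chebyshev"] / m["hoeffding"]` with `m = width_multipliers(0.05)`, show how much wider one interval is than another at leading order.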


Confidence  Interval  for  a  Proportion 

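Let $(X_1,Y_1),\ldots,(X_n,Y_n)$ denote i.i.d. random vectors with
$\mu_X = EX_i$, $\mu_Y = EY_i$, $\mathrm{pr}(0 \le X_i \le 1) = 1$, $\mathrm{pr}(0 \le Y_i \le 1) = 1$, $\mathrm{pr}(Y_i \le X_i) = 1$,
$\phi = \mu_Y/\mu_X$, $\bar X_n = (X_1+\cdots+X_n)/n$ and $\bar Y_n = (Y_1+\cdots+Y_n)/n$. Also, let

$$r(x,y,\beta,m) = \{xy + \beta^2/2m + \beta[\beta^2/4m^2 + y(x-y)/m]^{1/2}\}/(x^2+\beta^2/m),$$

$$0 \le y \le x \le 1,\ -\infty<\beta<\infty,\ m = 1,2,\ldots. \tag{8}$$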

Then $(r(\bar X_n,\bar Y_n,-\beta,n),\ r(\bar X_n,\bar Y_n,\beta,n))$ with $\beta = \alpha^{-1/2}$ covers $\phi$ with
confidence coefficient $> 1-\alpha$, and with $\beta = \Phi^{-1}(1-\alpha/2)$ asymptotically
$(n\to\infty)$ covers $\phi$ with confidence coefficient $1-\alpha$. These results
again follow from Chebyshev's inequality, the central limit
theorem and the observation that

$$\mathrm{var}(Y_i - \phi X_i) = \mathrm{var}(Y_i - \phi X_i + \phi) \le \phi(1-\phi).$$
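The endpoints given by (8) can be checked numerically. In this sketch the second numerator term is taken as $\beta^2/2m$, which is what solving $(y-\phi x)^2 = \beta^2\phi(1-\phi)/m$ for $\phi$ yields and which agrees with (3) at $x = 1$; the function names are illustrative.

```python
import math

def r(x, y, beta, m):
    """Endpoint function of equation (8); a negative beta gives the lower endpoint."""
    root = math.sqrt(beta**2 / (4 * m**2) + y * (x - y) / m)
    return (x * y + beta**2 / (2 * m) + beta * root) / (x**2 + beta**2 / m)

def chebyshev_ratio_interval(xbar, ybar, n, alpha):
    """Chebyshev interval for phi = mu_Y / mu_X, using beta = alpha**(-1/2)."""
    beta = alpha ** -0.5
    return r(xbar, ybar, -beta, n), r(xbar, ybar, beta, n)
```

With $\bar X_n = 0.5$, $\bar Y_n = 0.2$, $n = 200$ and $\alpha = .05$ the interval brackets the point estimate $\bar Y_n/\bar X_n = 0.4$.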

Again, one can derive a $100\times(1-\alpha)$ percent confidence interval,
shorter than the one that Chebyshev's inequality offers for small
$\alpha$ and that avoids the error of approximation that assuming
normality induces. Let

$$W_i = Y_i - \phi X_i + \phi$$

so that $\phi = EW_i$ and $\mathrm{pr}(0 \le W_i \le 1) = 1$. Then for $\bar W_n = (W_1+\cdots+W_n)/n$,
(1) applies in the form

$$\mathrm{pr}(\bar Y_n - \phi\bar X_n \ge \varepsilon) = \mathrm{pr}(\bar W_n - \phi \ge \varepsilon) \le e^{nf(\varepsilon,\phi)}. \tag{9}$$

This establishes the basis for Theorem 2.
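That $W_i$ lies in $[0,1]$ with mean $\phi$ follows in a few lines from $0 \le Y_i \le X_i \le 1$ and $0 \le \phi \le 1$. The check below is a sketch filled in here, not part of the original argument.

```latex
\begin{align*}
W_i &= Y_i + \phi(1-X_i) \ge 0, \\
W_i &= (Y_i - X_i) + X_i(1-\phi) + \phi \le (1-\phi) + \phi = 1, \\
EW_i &= \mu_Y - \phi\mu_X + \phi = \phi, \\
\bar W_n - \phi &= \bar Y_n - \phi\bar X_n .
\end{align*}
```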

Theorem 2. Let $(X_1,Y_1),\ldots,(X_n,Y_n)$ be i.i.d. random vectors
with $\mu_X = EX_i$, $\mu_Y = EY_i$, $\mathrm{pr}(0 \le X_i \le 1) = 1$, $\mathrm{pr}(0 \le Y_i \le 1) = 1$,
$\mathrm{pr}(Y_i \le X_i) = 1$, $\lambda = \max[\mathrm{pr}(Y_i=0),\ \mathrm{pr}(Y_i=1)]$, $\bar X_n = (X_1+\cdots+X_n)/n$,
$\bar Y_n = (Y_1+\cdots+Y_n)/n$ and $\phi = \mu_Y/\mu_X$. Then for $n > \ln(\alpha/2)/\ln\lambda$,
$(\Psi_1(\bar X_n,\bar Y_n,\alpha/2),\ \Psi_2(\bar X_n,\bar Y_n,\alpha/2))$ covers $\phi$ with probability $> 1-\alpha$
where $\Psi_1(x,y,\theta) \le y/x \le \Psi_2(x,y,\theta)$ are the solutions of

$$f(y - \Psi x, \Psi) = \frac{1}{n}\ln\theta, \qquad 0<y<x<1, \tag{10}$$

$f$ being defined in (2).
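The two roots of (10) can be located by bisection on either side of $y/x$, on the assumption (used in the proof for the lower root) that $-f(y-\Psi x,\Psi)$ is monotone in $\Psi$ on each side. The sketch below is illustrative only: `neg_f` and `ratio_interval` are names chosen here, $f$ is extended to negative first arguments for the upper root, and $0\cdot\ln 0 = 0$ is assumed.

```python
import math

def neg_f(eps, mu):
    """-f(eps, mu) from (2), allowing eps < 0, with the convention 0 * ln 0 = 0."""
    q = mu + eps
    if (q > 0.0 and mu <= 0.0) or (q < 1.0 and mu >= 1.0):
        return math.inf
    total = 0.0
    if q > 0.0:
        total += q * math.log(q / mu)
    if q < 1.0:
        total += (1.0 - q) * math.log((1.0 - q) / (1.0 - mu))
    return total

def ratio_interval(xbar, ybar, n, theta):
    """Return (Psi_1, Psi_2) solving equation (10); requires 0 <= ybar <= xbar, xbar > 0."""
    target = math.log(1.0 / theta) / n    # equals -(1/n) ln(theta) > 0
    centre = ybar / xbar                  # point estimate of phi

    def solve(far, near):
        # neg_f exceeds target at `far` and is below it at `near` (= centre).
        for _ in range(200):
            mid = 0.5 * (far + near)
            if neg_f(ybar - mid * xbar, mid) > target:
                far = mid
            else:
                near = mid
        return 0.5 * (far + near)

    lower = solve(0.0, centre) if ybar > 0.0 else 0.0
    upper = solve(1.0, centre) if ybar < xbar else 1.0
    return lower, upper
```

Setting `xbar = 1.0` reduces (10) to (4), so `ratio_interval(1.0, 0.3, 100, 0.025)` reproduces the Theorem 1 interval for $\bar X_n = 0.3$, $n = 100$ and $\alpha = .05$.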


                              if $\phi^n \le \alpha/2$
                              if $\phi^n > \alpha/2$                    (11)

where

$$g(\phi,\alpha/2,x) = \phi x + \varepsilon(\phi,\alpha/2).$$

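For $\phi^n > \alpha/2$,

$$\begin{aligned}
\mathrm{pr}(\bar Y_n - \phi\bar X_n \ge 1-\phi) &= \mathrm{pr}(\bar Y_n = 1,\ \bar X_n = 1) \\
&= \mathrm{pr}(\bar X_n = 1 \mid \bar Y_n = 1)\,\mathrm{pr}(\bar Y_n = 1) \\
&= \mathrm{pr}(\bar Y_n = 1) \\
&\le \lambda^n \le \alpha/2.
\end{aligned}$$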

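For $\phi^n \le \alpha/2$, we want to find the set of all $\phi$'s satisfying
(11). Observe that

$$\partial g(\phi,\alpha/2,x)/\partial\phi = x + \partial\varepsilon(\phi,\alpha/2)/\partial\phi$$

where (6) gives $\partial\varepsilon(\phi,\alpha/2)/\partial\phi$. Using the inequalities
$z/(1+z) < \ln(1+z) < z$ for $z > -1$ and $z \ne 0$, one has


Since $\varepsilon > 0$, $\phi \le \bar Y_n/\bar X_n$ so that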


-  $  /  M  -  i  )  £  (  z  -  Y  )  e  /  (  1  -  e  )  (  X  -  Y  *  e  )  >  (  e  -  Y 

n  n  n  n 


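and finally

$$\partial g(\phi,\alpha/2,\bar X_n)/\partial\phi > \bar X_n + (\varepsilon-\bar Y_n)/(1-\varepsilon) = [\bar X_n - \bar Y_n + \varepsilon(1-\bar X_n)]/(1-\varepsilon).$$

Therefore, the set of $\phi$'s of interest is $\{\phi : 0 < \phi \le \Psi_1(\bar X_n,\bar Y_n,\alpha/2)\}$ where

$$\Psi_1(x,y,\theta) = \{\psi : \psi x + \varepsilon(\psi,\theta) = y,\ 0<y<x<1,\ 0<\theta<1\},$$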


which is precisely the solution to (10). Consequently,

$$\mathrm{pr}[\Psi_1(\bar X_n,\bar Y_n,\alpha/2) \ge \phi] \le \alpha/2$$

so that

$$\mathrm{pr}[\phi > \Psi_1(\bar X_n,\bar Y_n,\alpha/2)] > 1-\alpha/2,$$

as required. The upper bound $\Psi_2(\bar X_n,\bar Y_n,\alpha/2)$ follows analogously
using

$$\mathrm{pr}(-\bar W_n + \phi \ge \varepsilon) \le e^{nf(\varepsilon,1-\phi)}.$$


Table 1

$n_0 = \min\{n : \lambda^n \le \alpha/2\}$

                        n_0
  lambda      alpha = .01    alpha = .05
  .90                  51             36
  .99                 528            368
  .999               5296           3688
  .9999             52981          36887

Table 2

Comparison of Interval Widths

$(1-\alpha/2)$     $\alpha^{-1/2}/[2\ln(2/\alpha)]^{1/2}$     $[2\ln(2/\alpha)]^{1/2}/$


